78 research outputs found

    A scalable learning algorithm for Kernel Probabilistic Classifier

    National audience. In this paper, we propose a probabilistic classification algorithm that learns a set of kernel functions associating a probability distribution over classes with an input vector. The model is obtained by maximizing a measure over the probability distributions through a local optimization process. This measure focuses on the faithfulness of the whole induced probability distribution rather than considering the class probabilities separately. We show that, thanks to a pre-processing computation, the complexity of evaluating this measure for a given model no longer depends on the size of the training set. This makes the local optimization of the whole set of kernel functions tractable, even for large databases. We evaluate our method on five benchmark datasets and the KDD Cup 2012 dataset.

    An informational distance for estimating the faithfulness of a possibility distribution, viewed as a family of probability distributions, with respect to data

    International audience. An acknowledged interpretation of possibility distributions in quantitative possibility theory is in terms of families of probabilities that are upper and lower bounded by the associated possibility and necessity measures. This paper proposes an informational distance function for possibility distributions that agrees with this view of possibility theory in both the continuous and the discrete cases. In particular, we show that, given a set of data following a probability distribution, the optimal possibility distribution with respect to our informational distance is the one obtained by the probability-possibility transformation that agrees with the maximal specificity principle. We also show that, when the optimal distribution is not available due to representation bias, maximizing this possibilistic informational distance provides more faithful results than approximating the probability distribution and then applying the probability-possibility transformation. We show that maximizing the possibilistic informational distance is equivalent to minimizing the squared distance to the unknown optimal possibility distribution. Two advantages of the proposed informational distance function are that (i) it does not require knowledge of the shape of the probability distribution that underlies the data, and (ii) it amounts to summing the elementary terms corresponding to the informational distance between the considered possibility distribution and each piece of data. We detail the particular case of triangular and trapezoidal possibility distributions, and we show that any unknown unimodal probability distribution can be faithfully upper-approximated by a triangular distribution obtained by optimizing the possibilistic informational distance.
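The probability-possibility transformation mentioned in this abstract can be illustrated in the discrete case. The sketch below assumes the usual maximally specific transformation, in which each outcome receives the total probability of all outcomes that are no more probable than it. The function names `prob_to_poss` and `dominates` are hypothetical and chosen for illustration; this is not the paper's informational distance, only the transformation it refers to.

```python
from itertools import combinations


def prob_to_poss(p):
    """Discrete probability-possibility transformation (assumed form).

    pi_i is the sum of every p_j with p_j <= p_i, so the most probable
    outcome gets possibility 1 and tied outcomes share the same degree.
    """
    return [sum(q for q in p if q <= p_i) for p_i in p]


def dominates(p, pi, tol=1e-12):
    """Check that the possibility measure upper-bounds the probability.

    For every non-empty event A, max(pi_i for i in A) must be at least
    sum(p_i for i in A). Exhaustive, so only suitable for small examples.
    """
    n = len(p)
    for size in range(1, n + 1):
        for event in combinations(range(n), size):
            possibility = max(pi[i] for i in event)
            probability = sum(p[i] for i in event)
            if possibility < probability - tol:
                return False
    return True


p = [0.5, 0.3, 0.2]
pi = prob_to_poss(p)
print(pi, dominates(p, pi))
```

The dominance check is what makes the result an upper bound in the sense of the "families of probabilities" interpretation: the possibility of any event is never below its probability.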

    From Visualization to Association Rules: an automatic approach

    International audience. The main goal of Data Mining is to find relevant information in a huge volume of data. It is generally achieved either by automatic algorithms or by visual exploration of the data. Algorithms can find an exhaustive set of patterns matching specific measures, but the volume of extracted information can be greater than the volume of the initial data. Visual Data Mining allows the specialist to focus on a specific area of the data that may contain interesting patterns. However, it is often limited by the difficulty of dealing with a large amount of multidimensional data. In this paper, we propose to combine an automatic and a manual method by driving the automatic extraction with a scatter plot visualization of the data. This visualization affects the number of rules found and their construction. We illustrate our method on two databases. The first describes one month of French air traffic and the second comes from the 2012 KDD Cup database.

    Utilisation d'outils de Visual Data Mining pour l'exploration d'un ensemble de règles d'association

    International audience. Data Mining aims at extracting as much knowledge as possible from huge databases. It is carried out either by an automatic process or by visual exploration of the data with interactive tools. Automatic data mining extracts all the patterns that match a set of metrics. The limitation of such algorithms is that the amount of extracted data can be larger than the initial data volume. In this article, we focus on association rule extraction with the Apriori algorithm. After describing a characterization model for a set of association rules, we propose to explore the results of a Data Mining algorithm with an interactive visual tool. There are two advantages. First, the tool visualizes the results of the algorithm from different points of view (metrics, rule attributes). Second, it allows us to easily select the most relevant rules from a large set of rules.
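To make the rule extraction step concrete, here is a minimal sketch of Apriori-style frequent itemset mining followed by rule generation with support and confidence, the kind of metrics a visual tool could then explore. The function names `apriori` and `rules`, the thresholds, and the toy transactions are assumptions for illustration and do not come from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: build size-k candidates from frequent (k-1)-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, supp, confidence))
    return found


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for lhs, rhs, supp, conf in rules(apriori(baskets, 0.5), 0.7):
    print(sorted(lhs), "->", sorted(rhs), round(supp, 2), round(conf, 2))
```

Lowering either threshold quickly increases the number of rules returned, which is the volume problem the abstract describes.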

    Mining aeronautical data by using visualized driven rules extraction approach

    International audience. Data Mining aims at finding relevant information in a huge volume of data. It can be automatic, thanks to algorithms, or manual, for instance by using visual exploration tools. An algorithm finds an exhaustive set of patterns matching specific measures. However, depending on the measure thresholds, the volume of extracted information can be greater than the volume of the initial data. The second approach is Visual Data Mining, which helps the specialist to focus on specific areas of the data that may contain interesting patterns. However, it is generally limited by the difficulty of handling a large amount of multidimensional data. In this paper, we propose to use both methods, combining algorithms with manual visual data mining. From a scatter plot visualization, an algorithm generates association rules that depend on the assignment of the visual variables. These assignments thus have a direct effect on the construction of the rules found. We then characterize the visualization with the extracted association rules in order to show how the data are involved in the rules, and which data can be used for predictions. We illustrate our method on two databases. The first describes one month of French air traffic and the second comes from an FAA database on the causes of delays and cancellations.

    Increasing Air Traffic Control simulations realism through voice transformation

    International audience. Improving realism in simulations is a critical issue. In some air traffic control (ATC) simulations, we use a pseudo-pilot who pilots up to fifteen aircraft. Having the same voice for different aircraft decreases the realism of the simulation and may be confusing for the controllers, especially in a study context. In a research context, a virtual aircraft piloted in a flight simulator is sometimes needed in addition to the pseudo-pilot. For simulation needs, the flight simulator aircraft must be merged with those of the pseudo-pilot. This is not possible without voice modification, since the controller can distinguish the pilot's voice. In this paper, we propose a method for transforming the voices of the pilot and the pseudo-pilot in order to have one particular voice and cabin noise for each aircraft. The two experiments that have been conducted show that, with our voice modification algorithm, the realism of the simulation is enhanced and the voice biases disappear.

    Representing uncertainty by possibility distributions encoding confidence bands, tolerance and prediction intervals

    For a given sample set, there are already different methods for building possibility distributions that encode the family of probability distributions that may have generated the sample set. Almost all the existing methods are based on parametric and distribution-free confidence bands. In this work, we introduce new possibility distributions that encode kinds of uncertainty not treated before: statistical tolerance and prediction intervals (regions). We also propose a possibility distribution encoding the confidence band of the normal distribution, which improves on the existing one for all sample sizes. We keep the idea of building possibility distributions based on intervals that are among the smallest intervals for small sample sizes. We also discuss the properties of these possibility distributions.

    Building possibility distribution based on confidence intervals of parameters of Gaussian mixtures

    International audience. In parametric methods, building a probability distribution from data requires a priori knowledge of the shape of the distribution. Once the shape is known, we can estimate the optimal parameter values from the data set. However, there is always a gap between the parameters estimated from the sample set and the true parameters, and this gap depends on the number of observations. Even if the parameter values cannot be estimated exactly, confidence intervals for these parameters can be built. One interpretation of quantitative possibility theory is in terms of families of probabilities that are upper and lower bounded by the associated possibility and necessity measures. In this paper, we assume that the data follow a Gaussian distribution or a mixture of Gaussian distributions. We propose to use confidence intervals on the parameters (computed from a sample set of data) to build a possibility distribution that upper-approximates the family of probability distributions whose parameters lie in the confidence intervals. Starting from the case of a single Gaussian distribution, we extend our approach to Gaussian mixture models.

    Génération et placement de couleurs sur une vue de type métro

    International audience. Schematic views for metro maps are used to maximize the transmission of relevant information (lines, metro stops) in network visualization. Automatic generation of metro maps focuses primarily on the physical structure of the network, and little on the choice of colors, which matters for accurate visual discrimination. In this article, we propose to investigate the generation and placement of the colors to be assigned to the lines of a network. The first step is to find as many colors as there are lines in the network. These colors must be as perceptually distant as possible and available in the vocabulary of colors. The second step is to place these colors so that the closest lines have the most distant colors. The positioning of colors is an NP-complete problem, so we use a meta-heuristic approach to solve it. To validate our method, we apply it to the field of air traffic control with maps of flight routes.
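The two steps described in this abstract can be sketched as follows, under strong simplifying assumptions. Colors are picked greedily from a small named palette using Euclidean distance in RGB, which is only a crude stand-in for a perceptual distance. Placement uses a plain swap-based hill climbing, a simple local search substituted here because the abstract does not name its meta-heuristic. The palette, the `line_dist` matrix, the scoring function, and all function names are hypothetical.

```python
import itertools
import math
import random

# Hypothetical named palette (RGB); the paper's color vocabulary is not given.
PALETTE = {
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "purple": (128, 0, 128),
    "orange": (255, 165, 0), "black": (0, 0, 0), "cyan": (0, 255, 255),
}


def color_dist(a, b, palette):
    # Assumption: Euclidean RGB distance approximates perceptual distance.
    return math.dist(palette[a], palette[b])


def pick_colors(palette, k):
    """Greedy farthest-point selection of k mutually distant colors."""
    names = sorted(palette)
    chosen = [names[0]]
    while len(chosen) < k:
        remaining = [n for n in names if n not in chosen]
        chosen.append(max(
            remaining,
            key=lambda n: min(color_dist(n, c, palette) for c in chosen),
        ))
    return chosen


def score(assignment, line_dist, palette):
    """Higher when close lines (small line_dist) get distant colors."""
    return sum(
        color_dist(assignment[i], assignment[j], palette) / line_dist[i][j]
        for i, j in itertools.combinations(range(len(assignment)), 2)
    )


def place_colors(colors, line_dist, palette, iters=1000, seed=0):
    """Hill climbing over swaps; assignment[i] is the color of line i."""
    rng = random.Random(seed)
    current = list(colors)
    best = score(current, line_dist, palette)
    for _ in range(iters):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        candidate = score(current, line_dist, palette)
        if candidate > best:
            best = candidate
        else:
            # Undo the swap when it does not improve the score.
            current[i], current[j] = current[j], current[i]
    return current, best


# Toy symmetric distances between four lines (hypothetical values).
line_dist = [[0, 1, 5, 9], [1, 0, 2, 7], [5, 2, 0, 3], [9, 7, 3, 0]]
placed, value = place_colors(pick_colors(PALETTE, 4), line_dist, PALETTE)
print(placed)
```

Hill climbing can stall in a local optimum; a meta-heuristic such as simulated annealing or a genetic algorithm would accept some non-improving moves to escape it.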